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Abstract 

We propose a robust and efficient approach to the problem of compressive phase 
retrieval in which the goal is to reconstruct a sparse vector from the magnitude 
of a number of its linear measurements. The proposed framework relies on con¬ 
strained sensing vectors and a two-stage reconstruction method that consists of 
two standard convex programs that are solved sequentially. 

In recent years, various methods are proposed for compressive phase retrieval, but 
they have suboptimal sample complexity or lack robustness guarantees. The main 
obstacle has been that there is no straightforward convex relaxations for the type 
of structure in the target. Given a set of underdetermined measurements, there is a 
standard framework for recovering a sparse matrix, and a standard framework for 
recovering a low-rank matrix. However, a general, efficient method for recovering 
a jointly sparse and low-rank matrix has remained elusive. 

Deviating from the models with generic measurements, in this paper we show that 
if the sensing vectors are chosen at random from an incoherent subspace, then the 
low-rank and sparse structures of the target signal can be effectively decoupled. 
We show that a recovery algorithm that consists of a low-rank recovery stage fol¬ 
lowed by a sparse recovery stage will produce an accurate estimate of the target 
when the number of measurements is 0 (k log f ), where k and d denote the spar¬ 
sity level and the dimension of the input signal. We also evaluate the algorithm 
through numerical simulation. 


1 Introduction 

1.1 Problem setting 

The problem of Compressive Phase Retrieval (CPR) is generally stated as the problem of estimating 
a /.'-sparse vector x * £ W 1 from noisy measurements of the form 

yi = \(a u x*)\ 2 +Zi (1) 

for i = 1.2,..., n, where a, is the sensing vector and z, denotes the additive noise. In this paper, 
we study the CPR problem with specific sensing vectors a; of the form 

a 4 = ^ T w u (2) 

where \P £ M m x d and £ W" are known. In words, the measurement vectors live in a fixed 
low-dimensional subspace (i.e, the row space of \P). These types of measurements can be applied in 
imaging systems that have control over how the scene is illuminated; examples include systems that 
use structured illumination with a spatial light modulator or a scattering medium [1,2], 
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By a standard lifting of the signal x* to X* = x*x* T , the quadratic measurements (1) can be 
expressed as 

y t = (a.jdj, X*) + Zi = (vTwiivJ#, X *) + (3) 

With the linear operator W and A defined as 

[(w iW J, B)] " =1 and A : X ^ W (vX# 7 ^ , 

we can write the measurements compactly as 

y = A(X*) + z. 

Our goal is to estimate the sparse, rank-one, and positive semidefinite matrix X* from the measure¬ 
ments (3), which also solves the CPR problem and provides an estimate for the sparse signal x* up 
to the inevitable global phase ambiguity. 

Assumptions We make the following assumptions throughout the paper. 

Al. The vectors Wi are independent and have the standard Gaussian distribution on R m : w r ~ 
N(0,I). 

A2. The matrix S' is a restricted isometry matrix for 2/c-sparse vectors and for a constant b 2 fc € 
[0,1]. Namely, it obeys 

(l-&k)ll*ll2< N|2<(l + WHl2. ( 4 ) 

for all 2fc-sparse vectors x £ M. d . 

A3. The noise vector z is bounded as ||z|| 2 < £■ 

As will be seen in Theorem 1 and its proof below, the Gaussian distribution imposed by the assump¬ 
tion Al will be used merely to guarantee successful estimation of a rank-one matrix through trace 
norm minimization. However, other distributions (e.g., uniform distribution on the unit sphere) can 
also be used to obtain similar guarantees. Furthermore, the restricted isometry condition imposed 
by the assumption A2 is not critical and can be replaced by weaker assumptions. However, the guar¬ 
antees obtained under these weaker assumptions usually require more intricate derivations, provide 
weaker noise robustness, and often do not hold uniformly for all potential target signals. Therefore, 
to keep the exposition simple and straightforward we assume (4) which is known to hold (with high 
probability) for various ensembles of random matrices (e.g., Gaussian, Rademacher, partial Fourier, 
etc). Because in many scenarios we have the flexibility of selecting S', the assumption (4) is realistic 
as well. 

Notation Let us first set the notation used throughout the paper. Matrices and vectors are denoted 
by bold capital and small letters, respectively. The set of positive integers less than or equal to 
n is denoted by [n]. The notation / = 0 {g) is used when f = eg for some absolute constant 
c > 0. For any matrix M, the Frobenius norm, the nuclear norm, the entrywise £i-norm, and the 
largest entrywise absolute value of the entries are denoted by ||iVf || F , || AtT||||A/’|| 1 , and \\M || , 

respectively. To indicate that a matrix M is positive semidefinite we write M ip 0. 

1.2 Contributions 

The main challenge in the CPR problem in its general formulation is to design an accurate estimator 
that has optimal sample complexity and computationally tractable. In this paper we address this 
challenge in the special setting where the sensing vectors can be factored as (2). Namely, we propose 
an algorithm that 

• provably produces an accurate estimate of the lifted target X * from only n = 0 (k log jr) 
measurements, and 

• can be computed in polynomial time through efficient convex optimization methods. 
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1.3 Related work 


Several papers including [3, 4, 5, 6, 7] have already studied the application of convex programming 
for (non-sparse) phase retrieval (PR) in various settings and have established estimation accuracy 
through different mathematical techniques. These phase retrieval methods attain nearly optimal 
sample complexities that scales with the dimension of the target signal up to a constant factor [4, 5, 6] 
or at most a logarithmic factor [3]. However, to the best of our knowledge, the exiting methods for 
CPR either lack accuracy and robustness guarantees or have suboptimal sample complexities. 

The problem of recovering a sparse signal from the magnitude of its subsampled Fourier transforms 
is cast in [8] as an l \-minimization with non-convex constraints. While [8] shows that a sufficient 
number of measurements would grow quadratically in k (i.e., the sparsity of the signal), the numer¬ 
ical simulations suggest that the non-convex method successfully estimates the sparse signal with 
only about k log | measurements. Another non-convex approach to CPR is considered in [9] which 
poses the problem as finding a /. -sparse vector that minimizes the residual error that takes a quartic 
form. A local search algorithm called GESPAR [10] is then applied to (approximate) the solution 
to the formulated sparsity-constrained optimization. This approach is shown to be effective through 
simulations, but it also lacks global convergence or statistical accuracy guarantees. An alternating 
minimization method for both PR and CPR is studied in [11]. This method is appealing in large 
scale problems because of computationally inexpensive iterations. More importantly, [11] proposes 
a specific initialization using which the alternating minimization method is shown to converge lin¬ 
early in noise-free PR and CPR. However, the number of measurements required to establish this 
convergence is effectively quadratic in k. In [12] and [13] the l\ -regularized form of the trace 
minimization 

argmin trace (X) + A 

x^o (5) 

subject to A ( X ) = y 

is proposed for the CPR problem. The guarantees of [13] are based on the restricted isometry prop¬ 
erty of the sensing operator X K > \(a,ri', X ) ]for sparse matrices. In [12], however, the anal¬ 
ysis is based on construction of a dual certificate through an adaptation of the golfing scheme [14], 
Assuming standard Gaussian sensing vectors a, and with appropriate choice of the regularization 
parameter A, it is shown in [12] that (5) solves the CPR when n = 0 (/c 2 logd). Furthermore, this 
method fails to recover the target sparse and rank-one matrix if n is dominated by k 2 . Estimation 
of simultaneously structured matrices through convex relaxations similar to (5) is also studied in 
[15] where it is shown that these methods do not attain optimal sample complexity. More recently, 
assuming that the sparse target has a Bernoulli-Gaussian distribution, a generalized approximate 
message passing framework is proposed in [16] to solve the CPR problem. Performance of this 
method is evaluated through numerical simulations for standard Gaussian sensing matrices which 
show the empirical phase transition for successful estimation occurs at n = 0 (fc log and also 
the algorithms can have a significantly lower runtime compared to some of the competing algo¬ 
rithms including GESPAR [10] and CPRL [13]. The PhaseCode algorithm is proposed in [17] to 
solve the CPR problem with sensing vectors designed using sparse graphs and techniques adapted 
from coding theory. Although PhaseCode is shown to achieve the optimal sample complexity, it 
lacks robustness guarantees. 

While preparing the final version of the current paper, we became aware of [18] which has indepen¬ 
dently proposed an approach similar to ours to address the CPR problem. 


2 Main Results 

2.1 Algorithm 

We propose a two-stage algorithm outlined in Algorithm 1 . Each stage of the algorithm is a convex 
program for which various efficient numerical solvers exists. In the first stage we solve (6) to obtain 
a low-rank matrix B which is an estimator of the matrix 

B* = ^X*^ T . 
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Then B is used in the second stage of the algorithm as the measurements for a sparse estimation 
expressed by (7). The constraint of (7) depends on an absolute constant C > 0 that should be 
sufficiently large. 


Algorithm 1: 




input : the measurements y, the operator W, and the matrix P 
output: the estimate X 

1 Low-rank estimation stage: 

B £ argmin trace (B) 

B)p 0 

( 6 ) 

subject to 

2 Sparse estimation stage: 

X £ argmin 

X 

\\W(B)-y\\ 2 <e 

*lli 




Ce 

(7) 

subject to 

PXP T B 

f — sfn 



Post-processing. The result of the low-rank estimation stage ( 6 ) is generally not rank-one. Simi¬ 
larly, the sparse estimation stage does not necessarily produce a X that is k x /c-sparse (i.e., it has 
at most k nonzero rows and columns) and rank-one. In fact, since we have not imposed the posi¬ 
tive semidefiniteness constraint (i.e., X 0) in (7), the estimate X is not even guaranteed to be 
positive semidefinite (PSD). However, we can enforce the rank-one or the sparsity structure in post¬ 
processing steps simply by projecting the produced estimate on the set of rank-one or k x fc-sparse 
PSD matrices. The simple but important observation is that projecting X onto the desired sets at 
most doubles the estimation error. This fact is shown by Lemma 2 in Section 4 in a general setting. 

Alternatives. There are alternative convex relaxations for the low-rank estimation and the sparse 
estimation stages of Algorithm (1). For example, ( 6 ) can be replaced by its regularized least squares 
analog 

B £ argmin ^ ||VV (B) — y \\ 2 2 + A ||B||* , 

for an appropriate choice of the regularization parameter A. Similarly, instead of (7) we can use 
an l \-regularized least squares. Furthermore, to perform the low-rank estimation and the sparse 
estimation we can use non-convex greedy type algorithms that typically have lower computational 
costs. For example, the low-rank estimation stage can be performed via the Wirtinger flow method 
proposed in [19]. Furthermore, various greedy compressive sensing algorithms such as the Iterative 
Hard Thresholding [20] and CoSaMP [21] can be used to solve the desired sparse estimation. To 
guarantee the accuracy of these compressive sensing algorithms, however, we might need to adjust 
the assumption A2 to have the restricted isometry property for cfc-sparse vectors with c being some 
small positive integer. 


2.2 Accuracy guarantees 


The following theorem shows that any solution of the proposed algorithm is an accurate estimator 
of X*. 


Theorem 1. Suppose that the assumptions Al, A2, and A3 hold with a sufficiently small constant 
82 k■ Then, there exist positive absolute constants C\, C 2 , and C 3 such that if 

n > Cirri, ( 8 ) 


then any estimate X of the Algorithm 1 obeys 


X -X* 


< 


C 2 £ 
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for all rank-one and k x k-sparse matrices X* 0 with probability exceeding 1 — e C3,n . 


The proof of Theorem 1 is straightforward and is provided in Section 4. The main idea is first to 
show the low-rank estimation stage produces an accurate estimate of B*. Because this stage can 
be viewed as a standard phase retrieval through lifting, we can simply use accuracy guarantees that 
are already established in the literature (e.g., [3, 6 , 5]). In particular, we use [5, Theorem 2] which 
established an error bound that holds uniformly for all valid B*. Thus we can ensure that X * is 
feasible in the sparse estimation stage. Then the accuracy of the sparse estimation stage can also be 
established by a simple adaptation of the analyses based on the restricted isometry property such as 
[ 22 ], 


The dependence of n (i.e., the number of measurements) and k (i.e., the sparsity of the signal) is 
not explicit in Theorem 1 . This dependence is absorbed in m which must be sufficiently large for 
Assumption A2 to hold. Considering a Gaussian matrix P, the following corollary gives a concrete 
example where the dependence of non k through m is exposed. 

Corollary 1. Suppose that the assumptions of Theorem 1 including ( 8 ) hold. Furthermore, suppose 
that P is a Gaussian matrix with iid N (0, entries and 


m > C\k log 


d 

k’ 


for some absolute constant c\ > 0. Then any estimate X produced by Algorithm 1 obeys 


X - X* 


F 



(9) 


for all rank-one and k x k-sparse matrices X* )>= 0 with probability exceeding 1 — 3e C2Tn for some 
constant C 2 > 0. 


Proof It is well-known that if P has iid N (0, and we have (9) then (4) holds with high prob¬ 
ability. For example, using a standard covering argument and a union bound [23] shows that if 
(9) holds for a sufficiently large constant Ci > 0 then we have (4) for a sufficiently small con¬ 
stant 62 k with probability exceeding 1 — 2 e~ cm for some constant c > 0 that depends only 
on 82 k■ Therefore, Theorem 1 yields the desired result which holds with probability exceeding 
1 — 2e~ cm — e~ Cd,n > 1 — 3e -C2m for some constant C 2 > 0 depending only on 82 k- □ 


3 Numerical Experiments 

We evaluated the performance of Algorithm 1 through some numerical simulations. The low-rank 
estimation stage and the sparse estimation stage are implemented using the TFOCS package [24], 
We considered the target fc-sparse signal x* to be in M 256 (i.e., d = 256). The support set of 
of the target signal is selected uniformly at random and the entry values on this support are drawn 
independently from N (0,1). The noise vector 2 is also Gaussian with independent N (0,10” 4 ). The 
operator W and the matrix P are drawn from some Gaussian ensembles as described in Corollary 

II X — X* II 

1. We measured the relative error || X< || F of achieved by the compared methods over 100 trials 
with sparsity level (i.e., k ) varying in the set { 2 ,4, 6 ,..., 20 }. 

In the first experiment, for each value of /.;, the pair (m, n) that determines the size W and P are 
selected from {(8k, 24 k), (8k, 32 k), (12 k, 36 k), (12 k, 48 k), (16fc, 48fc)}. Figure 1 illustrates the 
0.9 quantiles of the relative error versus k for the mentioned choices of m. 

In the second experiment we compared the performance of Algorithm 1 to the convex optimization 
methods that do not exploit the structure of the sensing vectors. The setup for this experiment is the 
same as in the first experiment except for the size of W and P ; we chose m = \2k(\ + log i )] and 
n = 3to, where [r] denotes the smallest integer greater than r. Figure 2 illustrates the 0.9 quantiles 
of the measured relative errors for Algorithm 1, the semidefinite program (5) for A = 0 and A = 0.2, 
and the £1 -minimization 

argmin WX^ 

x 

subject to A (X) = y, 
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Figure 1: The empirical 0.9 quantile of the relative estimation error vs. sparsity for various choices 
of in and n with d = 256. 


-O-2-stage -< - SDP SDP+fi t x 



Figure 2: The empirical 0.9 quantile of the relative estimation error vs. sparsity for Algorithm 1 
and different trace- and/or l x - minimization methods with d = 256, m = |~2fc (l + log ^)], and 
n = 3 to. 


which are denoted by 2-stage, SDP, SDP+fi, and l x , respectively. The SDP-based method did not 
perform significantly different for other values of A in our complementary simulations. The relative 
error for each trial is also overlaid in Figure 2 visualize its empirical distribution. The empirical 
performance of the algorithms are in agreement with the theoretical results. Namely in a regime 
where n = 0 (to) = 0 (fc log f), Algorithm 1 can produce accurate estimates whereas while the 
other approaches fail in this regime. The SDP and SDP+/ | show nearly identical performance. The 
t x -minimization, however, competes with Algorithm 1 for small values of fc. This observation can be 
explained intuitively by the fact that the £ x -minimization succeeds with n = 0 (fc 2 ) measurements 
which for small values of fc can be sufficiently close to the considered n = 3 [2fc (l + log |)1 
measurements. 


6 



















4 Proofs 


Proof of Theorem 1. Clearly, B * = PX*P T is feasible in 6 because of A3. Therefore, we can 
show that any solution B of (6) accurately estimates B* using existing results on nuclear-norm 
minimization. In particular, we can invoke [5, Theorem 2 and Section 4.3] which guarantees that for 
some positive absolute constants C\, C ' 2 , and C3 if (8) holds then 

C' 2 £ 


B - B 


< 


F 

holds for all valid B* , thereby for all valid X* , with probability exceeding 1 — e _<?3 ". Therefore, 
with C = Cf the target matrix X* would be feasible in (7). Now, it suffices to show that the 
sparse estimation stage can produce an accurate estimate of X* . Recall that by A2, the matrix P 
is restricted isometry for 2fc-sparse vectors. Let X be a matrix that is 2k x 2/,;-sparse, i.e., a matrix 
whose entries except for some 2k x 2k submatrix are all zeros. Applying (4) to the columns of X 
and adding the inequalities yield 


(1 — 82 k) 


X\\f< 


\PX\\ 2 F <(l + S 2k )\\X\ 


( 10 ) 


Because the columns of X ' P' are also 2/e-sparse we can repeat the same argument and obtain 


(1 — 82 k ) 


X T P T 


< 


PX T P T 


Using the facts that 
and (11) imply that 


X T P T 


(i - s 2k y 


= II^XILand 


F 

PX T P T 


< (1 + 82k) 


X T P T 


PXP' 


\X\\f< 


PXP' 


< (l + d 2fe ) 2 ||X|| 


( 11 ) 

F 

, the inequalities (10) 

( 12 ) 


The proof proceeds with an adaptation of the arguments used to prove accuracy of (i)-minimization 
in compressive sensing based on the restricted isometry property (see, e.g., [22]). Let E = X — X*. 
Furthermore, let So C [d] x [d] denote the support set of the k x fc-sparse target X*. Define E 0 to 
be a d x d matrix that is identical to E over the index set So and zero elsewhere. By optimality of 
X and feasibility of X* in (7) we have 


IX* 


li> 


X 


= IIX* 


= 11** 

1 +II-E- 


F E - E 0 
Eoh-l 


E o||r> 


E Eq ||! - 


^0II1 


I E, 


0 1 | 11 


where the last line follows from the fact that X and E — E o have disjoint supports. Thus, we have 

ll^-^olli < ll^olli <fcpo|| F . (13) 

Now consider a decomposition of E — Eq as the sum 

j 

(14) 


e-Eq = Y j e j , 

i=1 

such that for j > 0 the d x d matrices E :j have disjoint support sets of size k x k except perhaps for 
the last few matrices that might have smaller supports. More importantly, the partitioning matrices 
Ej are chosen to have a decreasing Frobenius norm (i.e., ||J5j|| F > ||-E7 )+i|| f ) for j > 1. We have 


1=2 


J 1 J 1 

rEll^-tllr^ tWE-Eq^K llBoll^^llBo + Stllf, (15) 


1=2 


2=2 


< 


where the chain of inequalities follow from the triangle inequality, the fact that \\E ;) 

^2 ||J5j_i||i by construction, the fact that the matrices Ej have disjoint support and satisfy (14), 
the bound (13), and the fact that Eq and FI 1 are orthogonal. Furthermore, we have 


P ( Eq + E 1 )P j = ( (E 0 + Ef) P T , P ( E - Ej 

2=2 




< 


P (Eq + El )'!' 1 


PEP' 


EE I (pE l P J ,PE j P 1 


i =0 2=2 


( 16 ) 
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where the first term is obtained by the Cauchy-Schwarz inequality and the summation is obtained by 
the triangle inequality. Because E = X — X * by definition, the triangle inequality and the fact that 
X* and X are feasible in (7) imply that PEP 1 < PXP 1 B 


PX*P T B 


< 


2 Ce 
y/n 


. Furthermore, Lemma 1 below which is adapted from [22, Lemma 2.1] guarantees that for 


PEjP 1 .PE ,P T 


i £ { 0 , 1 } and j > 2 we have ^ 


(l^S 2 k ) 2 \\E 0 + E 1 \\ 2 F < P(E a + E 1 )P 1 


< 2S 2 k \\E i \\ 


-. Therefore, we obtain 


< 


2 Ce 
y/n 


'E{E 0 + E 1 )V J + 26 2k J2J2^ E ^F\\ E . 


3\\F 


< (1 + 8 2k ) \\Eo + -E'lHj’ + 28 2k H^ill F ||E 

»=o j =2 


*=0 3=2 
1 J 


3 II F 


2 Ce 


<—(1 + S 2k ) ||£? 0 + E^p + 2 S 2k (||£1o|| f + ||£?i|| f ) ||JS 0 + ^i|| F 


/n 


< \\E 0 + E 


ill F 


( 2 Ce 

Wn 


(1 + S 2k ) + 2\[28 2k \\Eq + Ei\ 


where the chain of inequalities follow from the lower bound in ( 12 ), the bound (16), the upper 
bound in ( 12 ), the bound (15), and the fact that ||2?q|| f + ||i5i|| F < \J2 || £7 0 + -Ei|| f - If $ 2 k < 

1 + y/2 ^1 — y/l + v/2^ ~ 0.216, then we have 7 := (1 — S 2k ) 2 — 2y/25 2k > 0 and thus 

2C (1 + 5 2k ) s 


\En 


Ei\\p< 


Adding the above inequality to (13) and applying the triangle then yields the desired result. 


□ 


Lemma 1. Let P be a matrix obeying (4). Then for any pair of k x k-sparse matrices X and X' 
with disjoint supports we have 


PXP 1 ,PX’P T ^ < 28 2k ||X|| F ||a:'| 


F ' 


Proof Suppose that X and X' have unit Frobenius norm. Using the identity 


PXP 1 ,PX'P T ) = \ 


P(X + X') P 1 


P(X-X') P 1 


and the fact that 


X and X' have disjoint supports, it follows from (12) that 

-2 5 2k = ~( 1 + (?2fc ) < (pxP T ,PX'P T ) < ( 1 + J2fc ) -(f-^fc) = 25 ^ 


The general result follows immediately as the desired inequality is homogeneous in the Frobenius 
norms of X and X'. □ 

Lemma 2 (Projected estimator). Let S be a closed nonempty subset of a normed vector space 
(V, ||-||). Suppose that for v* £ S we have an estimator v £ V, not necessarily in S, that obeys 
||S — u*|| < e. Ifv denotes a projection ofv onto S, then we have ||S — n*|| < 2e. 

Proof By definition v £ argmin^gg ||z> — S||. Therefore, because v* £ S we have 
||S - v*|| < IIS - v*\\ + IIS - S|| < 2 ||S - v*\\ < 2e. 


□ 
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